Overview
What is Apache Hive?
Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.
Pricing
Entry-level set up fee?
- No setup fee
Offerings
- Free Trial
- Free/Freemium Version
- Premium Consulting/Integration Services
Product Demos
Apache Hive Hadoop Ecosystem - Big Data Analytics Tutorial by Mahesh Huddar
Connecting Microsoft Power BI to Apache Hive using Simba Hive ODBC driver
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Apache Hive Technical Details
| Detail | Value |
|---|---|
| Operating Systems | Unspecified |
| Mobile Application | No |
Reviews and Ratings (97)
Community Insights
Business Problems Solved
Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has proven to be particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.
Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been utilized for building reports, analyzing data stored in the Hadoop file system, processing events gathered in HDFS, and converting them into parquet files for fast querying.
Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.
Reviews
- Simple query language built on top of the MapReduce paradigm.
- Provides parallelism across a distributed system.
- Tabular format and interfaces are available for all cloud platforms.
- Due to the shuffled data, complex joins may take a long time to complete.
- Execution is dependent on external storage and memory.
Best Distributed Database in the market
- It is easy to store unstructured data
- Easy to retrieve data using SQL queries instead of more complicated methods
- Large data sets can be stored efficiently
- Apache Hive can provide more flexibility on integration.
Help your dev team!
- Simplifies queries for devs
- Organize data
- Batch process
- Deploy
- Maintenance
- Support
Spectacular SQL-like interface for accessing Hadoop
- Easy-to-use, interactive modern layout
- Easy to organize data and view tables and views from across the organization
- Fast speed for most queries
- Some queries, particularly complex joins, are still quite slow and can take hours
- Previous jobs and queries are not stored sometimes
- Switching to Impala can sometimes be time-consuming (i.e. the system hangs, or is slow to respond).
- Sometimes, directories and tables don't load properly which causes confusion
Best query platform for ETL.
It is an advance to the ease of the processes
- The unification of the data will help to establish the commercial criteria.
- We are sure that the data is protected
- If you try to extract an excessive amount of data, the system will become slow
- There is a risk that the system collapses under the volume of data
Capabilities of Apache Hive
- It can be used to retrieve data with SQL-like queries, as you would from a database.
- We can partition the data and distribute it among the clustered machines
- Easily scalable, which gives the capability of running analytics at a larger scale
- No support for working with unstructured data.
- ACID properties are not followed as they are in a database, which often creates confusion
- Supports OLAP environments only; OLTP is not supported
Excellent bigdata warehouse solution
Our use case is a large data analytics project where data frequency and velocity are very high. Apache Hive is very useful for processing both unstructured and structured data seamlessly. It reduces the need to write complex queries because it is built around SQL, and we have an engineering team that is very proficient at writing SQL queries to process big data with Apache Hive.
We have identified no business issues using the solution.
- Apache Hive supports external data tables.
- Supports data partitioning to improve overall performance.
- Apache Hive is a reliable and scalable solution.
- Apache Hive supports writing ad-hoc queries as well.
- Apache Hive is not best suited for OLTP-based jobs.
- Sometimes we observed high latency while querying data.
- Limited support for row-level data updates.
- Training materials need improvement.
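The performance benefit of partitioning mentioned in this review comes from skipping data: Hive keeps each partition's rows in its own directory, so a query that filters on the partition column only needs to read the matching directories. The following is a minimal Python sketch of that idea, not Hive's implementation; the table layout, column names, and rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an events table: (event_date, user_id, amount).
rows = [
    ("2024-01-01", "u1", 10),
    ("2024-01-01", "u2", 5),
    ("2024-01-02", "u1", 7),
    ("2024-01-03", "u3", 2),
]

def write_partitioned(rows):
    """Group rows by the partition column, mirroring how Hive lays out
    directories such as event_date=2024-01-01/."""
    partitions = defaultdict(list)
    for event_date, user_id, amount in rows:
        partitions[f"event_date={event_date}"].append((user_id, amount))
    return dict(partitions)

def total_amount(partitions, event_date):
    """Sum amounts for one date, touching only the matching partition."""
    scanned = partitions.get(f"event_date={event_date}", [])
    return sum(amount for _, amount in scanned), len(scanned)

partitions = write_partitioned(rows)
total, rows_scanned = total_amount(partitions, "2024-01-01")
print(total, rows_scanned)  # 15 2
```

Only two of the four rows are scanned for the filtered date, which is the effect partition pruning aims for on much larger tables.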
The Metastore stores metadata for each table and its schema. The Driver acts as a controller for the execution of statements. Other components, such as the Optimizer, CLI, and Thrift Server, enable the processing of big data transformations.
very useful for OLTP
- Used in data warehousing, similar to ETL tools.
- SQL-like interface gives access to data stored in various databases.
- Enables analytics at massive scale.
- Way of framework development can be improved.
- OLTP is not supported.
- Does not offer real time queries.
Apache Hive
- Apache Hive is fault-tolerant.
- Apache Hive's latest version supports ACID transactions.
- Apache Hive supports UPDATE, DELETE and MERGE.
- Apache Hive should support ROLLBACK, COMMIT operations.
- Apache Hive should support XML SerDe.
- Apache Hive.
Walk into the World of Big Data with Apache Hive
- Simple query language built on top of the MapReduce paradigm.
- Provides parallel execution over distributed system.
- Tabular format and connectors available for all cloud platforms.
- Complex joins may take time to execute due to shuffling of data.
- Static queries mostly.
- Slower than Apache Spark by almost 100 times.
- Dependent on external memory and storage to execute.
Big Data the SQL way
- The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
- I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
- I particularly liked the UDF functionality where the user could define functions to produce particular output.
- Transactions are not supported
- Lack of subqueries made some tasks achievable only when completing one query and then the subsequent one
- It is not as fast as Spark.
On the other hand, it's definitely slower than some alternatives such as Spark. Also, it's not recommended for processing small datasets. Pandas and other common data loading libraries are better suited to small datasets.
Apache Hive: Big data querying tool w/SQL interface, but slower, more costly computation
- Flexibility through schema on read
- Familiar SQL-like query language
- Functions for complex queries and analysis
- Slower processing than other tools on the market
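"Schema on read" means the stored files are left as they are and a table schema is applied only when the data is queried, so the same files can be read under different schemas. The Python sketch below illustrates the idea under stated assumptions: the sample lines and both schemas are hypothetical, and turning unconvertible values into `None` is meant to resemble the NULL that Hive typically returns for fields that do not fit the declared type.

```python
# Raw file contents stay untouched; this sample data is hypothetical.
raw_lines = ["1,alice,34", "2,bob,not_a_number", "3,carol,29"]

def read_with_schema(lines, schema):
    """Apply (name, type) columns to each comma-separated line at read time.
    Values that cannot be converted to the declared type become None."""
    for line in lines:
        record = {}
        for (name, cast), value in zip(schema, line.split(",")):
            try:
                record[name] = cast(value)
            except ValueError:
                record[name] = None
        yield record

# Two schemas over the same bytes: one typed, one treating everything as text.
schema_typed = [("id", int), ("name", str), ("age", int)]
schema_text = [("id", str), ("name", str), ("age", str)]

as_typed = list(read_with_schema(raw_lines, schema_typed))
as_text = list(read_with_schema(raw_lines, schema_text))
print(as_typed[1])        # {'id': 2, 'name': 'bob', 'age': None}
print(as_text[1]["age"])  # not_a_number
```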
Hive: When SQL marries with Hadoop
- The SQL-like query interface is the core value of Hive.
- It supports various storage formats and also allows indexing.
- It is fast.
- No transaction support.
- No sub-query support.
- Can only deal with the cold data (non-real time).
Manage data for your warehouse as strong as a beehive using Apache Hive!
It was one of those technical sessions and I was supposed to demonstrate a word count program on a novel downloaded from Project Gutenberg. I was able to download the novel, load it into the Hadoop platform, and execute a HiveQL (the SQL-like syntax used by Apache Hive) query to show a few unique words, their counts, and related examples.
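A word count of the kind described here might look roughly like the sketch below. The reviewer's actual query is not shown, so the HiveQL string is only an illustration: the table `docs` and column `line` are hypothetical, while `split()` and `explode()` are Hive built-in functions. The Python function performs approximately the same aggregation locally so the logic can be checked without a cluster.

```python
from collections import Counter

# Illustrative HiveQL only; `docs` and `line` are assumed names.
HIVEQL_WORD_COUNT = """
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(lower(line), ' ')) AS word FROM docs) w
GROUP BY word
ORDER BY cnt DESC
"""

def word_count(lines):
    """Count lower-cased, whitespace-separated words across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

# Hypothetical stand-in for lines of the downloaded novel.
sample = ["the quick brown fox", "the lazy dog", "The end"]
counts = word_count(sample)
print(counts.most_common(1))  # [('the', 3)]
```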
- The capability to handle large amounts of data and its querying process.
- A syntax similar to SQL is an added advantage.
- Active developer support and a community always ready to help.
- Ease of usage.
- Resource-consuming at times, perhaps because I was using a large object file.
- Needs update or modify functionality. This is a minimal CRUD requirement.
The only underlying problem could be that Apache Hive is designed to run on the Apache Hadoop ecosystem. People who are not comfortable with a Linux-style tree-structured file system, or who are unlikely to use a Linux OS, might not like to use Hive.
Reliable, cheap and trustworthy!
- Reading databases
- Writing databases
- Storing databases
- Distributed databases
- Improvement techniques for handling Relational Data
- Advanced optimizations
- Transactions memory
Apache Hive: SQL, open-source querying tool
- Monitor query performance
- Manage tables in the data warehouse
- Uses standard SQL
- UI is quite dated and not intuitive
- Open-source, so does not have consistent updates or support
- Not the most optimal for ETL processes
My Apache Hive Review
- Querying in Apache Hive is very simple because it is very similar to SQL.
- Hive produces good ad hoc queries required for data analysis.
- Another advantage of Hive is that it is scalable.
- Apache Hive isn't designed for and doesn't support online processing of data.
- Subqueries are not supported.
- Updating the data can be a problematic task.
Hive is a solid data analytics tool
- It's Fast!
- You can store a different kind of data structures here other than the standard ones
- Good scalability
- Good redundancy too
- It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
- This is not the tool to go for online data processing.
- It does not support sub-queries.
- It can't process data in real time.
It's good for fast query processing and for storing large amounts of data.
Hive - SQL-like query engine for big data platform
- Querying, joining and aggregating data
- Built-in and user-defined functions
- Speed
- Support for other big data frameworks like Spark
- Need better user interfaces for browsing datastores and querying
One of the first SQL on Hadoop tools. Perhaps not the best.
- One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
- Hive LLAP (Live Long and Process) has recently brought significant improvements to long-running queries.
- Allows BI tools to run analysis over Hadoop data.
- Allows various relational databases for its metastore. These include MySQL, Postgres, Derby, or Oracle.
- Needs to keep up with execution engine improvements. Hive on Spark or Tez, then LLAP, are good starts.
- Overall speed of ad-hoc querying could be improved.
Bringing Structure to your Unstructured Data
2. Events are gathered in HDFS by Flume and need to be processed into Parquet files for fast querying. The input data contains variable attributes in the JSON payload, as each customer can define custom attributes.
- Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive.
- Ability to run MapReduce jobs that parse JSON and generate dynamic partitions in Parquet file format.
- Simplifies your experience with Hadoop especially for non-technical/coding partners.
- Hive doesn't support many features that traditional RDBMS SQL has, so it may not be as easy a transition as one would presume.
- Being open source, it has its share of problems and lack of support; you need to explore community groups to get clarifications if you are not using one of the big distribution providers like Cloudera or HW.
- Hive is comparatively slower than its competitors. It's easy to use but that comes with the cost of processing. If you are using it just for batch processing then Hive is well and fine.
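The event processing this reviewer describes (JSON payloads with customer-defined attributes, written out under dynamically chosen partitions) can be sketched in a few lines of Python. This is only an illustration of the data flow, not the reviewer's job: the field names `customer`, `ts`, and `attrs` are assumptions, and the step that would write each group to a Parquet file is left out.

```python
import json
from collections import defaultdict

# Hypothetical raw events; each customer may send different attributes.
raw_events = [
    '{"customer": "acme", "ts": "2024-01-01T10:00:00", "attrs": {"plan": "pro"}}',
    '{"customer": "acme", "ts": "2024-01-02T09:30:00", "attrs": {"seats": 5}}',
    '{"customer": "globex", "ts": "2024-01-01T11:15:00", "attrs": {}}',
]

def partition_events(lines):
    """Parse each JSON line and group it under a partition key derived from
    the data itself (customer and day), which is what makes it dynamic."""
    partitions = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        key = (event["customer"], event["ts"][:10])
        # Keep custom attributes as a free-form map instead of fixed columns.
        partitions[key].append({"ts": event["ts"], "attrs": event.get("attrs", {})})
    return dict(partitions)

parts = partition_events(raw_events)
for (customer, day), events in sorted(parts.items()):
    print(f"customer={customer}/day={day}: {len(events)} event(s)")
```

Storing the variable attributes as a map keeps the table schema stable even when customers add new attribute names.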
We are trying to mine data from massive data sets for a wide variety of purposes (debugging production issues, creating business metrics, models, and forecasts, among other things). We have been able to do this very easily using our data warehouse and a combination of Hive and Pig. It makes things simpler for your BAs, as they are familiar with SQL and can adapt to Hive without much technical know-how.
Apache Hive for ETL workloads
- Hive is good for ETL workloads on Hadoop.
- HiveQL translates SQL-like queries into MapReduce jobs. It supports plugging in custom MapReduce scripts.
- Hive has two kinds of tables: Hive-managed tables and external tables.
- Use an external table when other applications such as Pig, Sqoop, or MapReduce also use the file in HDFS. Dropping an external table from Hive deletes only the metadata in Hive; the original file in HDFS stays.
- Use Hive for analytical workloads in write-once, read-many scenarios. Avoid updates and deletes.
- Behind the scenes, Hive creates MapReduce jobs. Hive performance is slow compared to Apache Spark.
- MapReduce writes intermediate outputs to disk, whereas Spark operates in memory and uses a DAG.
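The difference between managed and external tables described in this review shows up when a table is dropped. The toy model below sketches that behavior in Python; `ToyWarehouse`, the table names, and the paths are all hypothetical and stand in for Hive's metastore and HDFS rather than reproducing them.

```python
class ToyWarehouse:
    """Toy model of table metadata plus stored files; not Hive's metastore."""

    def __init__(self):
        self.metadata = {}  # table name -> (location, is_external)
        self.files = set()  # paths that currently exist in storage

    def create_table(self, name, location, external=False):
        self.metadata[name] = (location, external)
        self.files.add(location)

    def drop_table(self, name):
        location, external = self.metadata.pop(name)
        # Managed table: the data goes away with the metadata.
        # External table: only the metadata entry is removed.
        if not external:
            self.files.discard(location)

wh = ToyWarehouse()
wh.create_table("managed_sales", "/warehouse/managed_sales")
wh.create_table("shared_logs", "/data/shared_logs", external=True)
wh.drop_table("managed_sales")
wh.drop_table("shared_logs")
print(sorted(wh.files))  # ['/data/shared_logs']
```

After both drops, only the external table's data remains, which is why external tables suit files shared with other tools.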
Apache Hive - Querying Big Data Made Easy!
Apache Hive solves a few issues for us, the main one being the ability to analyze large volumes of data on S3 directly with strong overall performance. We have been able to analyze billions of records in a matter of minutes with a relatively small EC2 cluster using Apache Hive. It also allows our data analysts to simply write SQL and avoids the ramp-up needed for other tools such as Apache Pig.
- Apache Hive allows us to write expressive solutions to complex problems thanks to its SQL-like syntax.
- Relatively easy to set up and start using.
- Very little ramp-up to start using the actual product, documentation is very thorough, there is an active community, and the code base is constantly being improved.
- Debugging can be messy with ambiguous return codes and large jobs can fail without much explanation as to why.
- Hive is only SQL-like, while more features are being added we have found that some things do not translate over (for example outer joins, inserts, columns can only be referenced once in a select, etc.).
- For our ETL jobs it does not seem to be the optimal tool because tuning and performance are difficult; Apache Pig may be better for heavy processing jobs.
Hive Away, but not for everything!
- Hive, which leverages traditional MapReduce at its core, can be used to process a large amount of data without a problem. Any problem that can be solved with MapReduce can now be simply expressed in SQL.
- Hive leverages the disk in the case of processing large data and is not limited by physical memory of any one machine (which is a limitation for systems like Presto). Hence it even allows reasonable fact-fact cross joins.
- Hive is extensible with UDFs. For any common patterns you can quickly write your own function set and it can be leveraged by everyone.
- Hive's SQL syntax is unique and does not conform to ANSI SQL. This is quite painful for beginners.
- The ability to upsert records would be nice to have. Hive is cumbersome for mutable data where partitions require them to be rewritten. No one has solved this really well. If this is solved - it could be leveraged by many systems.